Abstract. This paper is concerned with nonparametric estimation of the Levy density
of a pure jump Levy process. The sample path is observed at n discrete instants with fixed
sampling interval. We construct a collection of estimators obtained by deconvolution
methods and deduced from appropriate estimators of the characteristic function and its
first derivative. We obtain a bound for the $L^2$-risk, under general assumptions on the
model. Then we propose a penalty function that allows us to build an adaptive estimator.
The risk bound for the adaptive estimator is obtained under additional assumptions on
the Levy density. Examples of models fitting in our framework are described and rates
of convergence of the estimator are discussed. June 20, 2008
1. Introduction

In recent years, the use of Levy processes for modelling purposes has become very popular
in many areas and especially in the field of finance (see e.g. Eberlein and Keller (1995),
Barndorff-Nielsen and Shephard (2001), Cont and Tankov (2004); see also Bertoin (1996)
or Sato (1999) for a comprehensive study of these processes). The distribution of
a Levy process is usually specified by its characteristic triple (drift, Gaussian component 
and Levy measure) rather than by the distribution of its independent increments. Indeed, 
the exact distribution of these increments is most often intractable or even has no closed 
form formula. For this reason, the standard parametric approach by likelihood methods 
is a difficult task and many authors have rather considered nonparametric methods. For 
Levy processes, estimating the Levy measure is of crucial importance since this measure 
specifies the jumps behavior. Nonparametric estimation of the Levy measure has been 
the subject of several recent contributions. The statistical approaches depend on the way 
observations are performed. For instance, Basawa and Brockwell (1982) consider non-decreasing
Levy processes and observations of jumps with size larger than some positive $\varepsilon$, or
discrete observations with fixed sampling interval. They build nonparametric estimators of 
a distribution function linked with the Levy measure. More recently, Figueroa-Lopez and 
Houdre (2006) consider a continuous-time observation of a general Levy process and study 
penalized projection estimators of the Levy density based on integrals of functions with 
respect to the random Poisson measure associated with the jumps of the process. However, 
their approach remains theoretical since these Poisson integrals are hardly accessible. 
In this paper, we consider nonparametric estimation of the Levy measure for real-valued
Levy processes of pure jump type, i.e. without drift and Gaussian component. We rely on
the common assumption that the Levy measure admits a density $n(x)$ on $\mathbb{R}$ and assume
that the process is discretely observed with fixed sampling interval $\Delta$. Let $(L_t)$ denote the
underlying Levy process and $(Z_k^\Delta = L_{k\Delta} - L_{(k-1)\Delta}, k = 1, \dots, n)$ be the observed random
variables, which are independent and identically distributed. Under our assumption, the
characteristic function of $Z_1^\Delta = L_\Delta$ is given by the following simple formula:

(1) $\psi_\Delta(u) = E(\exp(iuZ_1^\Delta)) = \exp\Big(\Delta \int_{\mathbb{R}} (e^{iux} - 1)\, n(x)\, dx\Big)$



where the unknown function is the Levy density $n(x)$. It is therefore natural to investigate
the nonparametric estimation of $n(x)$ using empirical estimators of the characteristic
function and its derivatives and then recover the Levy density by Fourier inversion. This
approach is illustrated by Watteel and Kulperger (2003) and Neumann and Reiss (2000).
However, these authors consider general Levy processes, with drift and Gaussian component.
Hence, at least two derivatives of the characteristic function are necessary to reach
the Levy density. Moreover, the way Fourier inversion is carried out in practice is not detailed
in these papers. In our case, under the assumption that $\int_{\mathbb{R}} |x| n(x) dx < \infty$, we get the
simple relation:



(2) $g^*(u) = \int e^{iux} g(x)\, dx = -i\, \frac{\psi'_\Delta(u)}{\Delta\, \psi_\Delta(u)},$

with $g(x) = x n(x)$. This equation indicates that we can estimate $g^*(u)$ by using empirical
counterparts of $\psi_\Delta(u)$ and $\psi'_\Delta(u)$ only. Then, the problem of recovering an estimator of $g$
looks like a classical deconvolution problem. We have at hand the methods used for estimating
unknown densities of random variables observed with additive independent noise.
This requires the additional assumption that $g$ belongs to $L^2(\mathbb{R})$. However, the problem
of deconvolution set by equation (2) is not standard and looks more like deconvolution in
presence of unknown error densities. This is due to the fact that both the numerator and
the denominator are unknown and have to be estimated from the same data. This is why
our estimator of $1/\psi_\Delta(u)$ is not a simple empirical counterpart. Instead, we use a truncated
version analogous to the one used in Neumann (1997) and Neumann and Reiss (2000).
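
For the reader's convenience, relation (2) can be checked in a few lines by differentiating (1) under the integral sign, which is legitimate when $\int |x| n(x) dx < \infty$. The following is only a sketch of that computation, using the notation of (1) and (2):

```latex
\begin{align*}
\psi_\Delta(u) &= \exp\Big(\Delta\int_{\mathbb{R}} (e^{iux}-1)\,n(x)\,dx\Big),\\
\psi'_\Delta(u) &= \psi_\Delta(u)\,\Delta\int_{\mathbb{R}} ix\,e^{iux}\,n(x)\,dx
  = i\,\Delta\,\psi_\Delta(u)\int_{\mathbb{R}} e^{iux}\,g(x)\,dx
  = i\,\Delta\,\psi_\Delta(u)\,g^*(u),\\
g^*(u) &= -i\,\frac{\psi'_\Delta(u)}{\Delta\,\psi_\Delta(u)} .
\end{align*}
```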

Below, we show how to adapt the deconvolution method described in Comte et al. (2006).
We consider an adequate sequence $(S_m, m = 1, \dots, m_n)$ of subspaces of $L^2(\mathbb{R})$ and build
a collection of projection estimators $(\hat g_m)$. Then using a penalization device, we select
through a data-driven procedure the best estimator in the collection. We study the $L^2$-risk
of the resulting estimator under the asymptotic framework that $n$ tends to infinity.
Although the sampling interval $\Delta$ is fixed, we keep it as much as possible in all formulae
since the distributions of the observed random variables highly depend on $\Delta$.

In Section 2, we give assumptions and some preliminary properties. Section 3 contains
examples of models included in our framework. Section 4 describes the statistical strategy.
We present the projection spaces and define the collection of estimators. Proposition 4.1
gives the upper bound for the risk of a projection estimator on a fixed projection space.
This proposition guides the choice of the penalty function and allows us to discuss the rates of
convergence of the projection estimators. Afterwards, we introduce a theoretical penalty
(depending on the unknown characteristic function $\psi_\Delta$) and study the risk bound of a
pseudo-estimator (which is not computable from the data) (Theorem 4.1). Then, we replace the theoretical
penalty by an estimated counterpart and give the upper bound of the risk of the resulting
penalized estimator (Theorem 4.2). Section 6 gives some conclusions and open problems.
Proofs are gathered in Section 5. In the Appendix, a fundamental result used in our proofs
is recalled.

2. Framework and assumptions. 

Recall that we consider the discrete time observation with sample step $\Delta$ of a Levy
process $L_t$ with Levy density $n$ and characteristic function given by (1). We assume
that $(L_t)$ is a pure jump process with finite variation on compacts. When the Levy
measure $n(x)dx$ is concentrated on $(0, +\infty)$, then $(L_t)$ has increasing paths and is called
a subordinator. We focus on the estimation of the real valued function

(3) $g(x) = x n(x)$,

and introduce the following assumptions on the function $g$:

(H1) $\int_{\mathbb{R}} |x| n(x) dx < \infty$.

(H2(p)) For $p$ integer, $\int_{\mathbb{R}} |x|^{p-1} |g(x)| dx < \infty$.

(H3) The function $g$ belongs to $L^2(\mathbb{R})$.

Note that (H1) is stronger than the usual assumption $\int (|x| \wedge 1) n(x) dx < +\infty$, and is also
a moment assumption for $L_t$. Under the usual assumption, (H2(p)) for $p \ge 1$ implies (H1)
and (H2(k)) for $k \le p$.

Our estimation procedure is based on the random variables

(4) $Z_i^\Delta = L_{i\Delta} - L_{(i-1)\Delta}, \quad i = 1, \dots, n,$

which are independent, identically distributed, with common characteristic function $\psi_\Delta(u)$.
The moments of $Z_1^\Delta$ are linked with the function $g$. More precisely, we have:

Proposition 2.1. Let $p \ge 1$ integer. Under (H2)(p), $E|Z_1^\Delta|^p < \infty$. Moreover, setting,
for $k = 1, \dots, p$, $M_k = \int_{\mathbb{R}} x^{k-1} g(x) dx$, we have $E(Z_1^\Delta) = \Delta M_1$, $E[(Z_1^\Delta)^2] = \Delta M_2 + \Delta^2 M_1^2$,
and more generally, $E[(Z_1^\Delta)^l] = \Delta M_l + o(\Delta)$ for all $l = 1, \dots, p$.

Proof. By the assumption, the exponent of the exponential in (1) is $p$ times differentiable
and, by differentiating $\psi_\Delta$, we get the result. □
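
To make the differentiation argument concrete, here is a short sketch of the computation behind Proposition 2.1, written in terms of cumulants. The symbol $\kappa_k$ for the $k$-th cumulant of $Z_1^\Delta$ is introduced here only for illustration and is not part of the original notation.

```latex
\begin{align*}
\log\psi_\Delta(u) &= \Delta\int_{\mathbb{R}} (e^{iux}-1)\,n(x)\,dx,
\qquad
\frac{d^k}{du^k}\log\psi_\Delta(u)\Big|_{u=0} = i^k\,\Delta\int_{\mathbb{R}} x^{k}\,n(x)\,dx = i^k\,\Delta\,M_k,\\
\kappa_k &= \Delta\,M_k \quad (k = 1,\dots,p),\\
E(Z_1^\Delta) &= \kappa_1 = \Delta M_1,\\
E[(Z_1^\Delta)^2] &= \kappa_2 + \kappa_1^2 = \Delta M_2 + \Delta^2 M_1^2,\\
E[(Z_1^\Delta)^3] &= \kappa_3 + 3\kappa_2\kappa_1 + \kappa_1^3
  = \Delta M_3 + 3\Delta^2 M_1 M_2 + \Delta^3 M_1^3 = \Delta M_3 + o(\Delta).
\end{align*}
```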

Assumption (H1) yields the relation (2), which is the basis of our estimation procedure.
We need a precise control of $\psi_\Delta$. For this, we introduce the assumption that, for $m_n$ an
integer to be defined later, the following holds:

(H4) $\forall x \in \mathbb{R}$, we have $c_\psi (1 + x^2)^{-\beta\Delta/2} \le |\psi_\Delta(x)| \le C_\psi (1 + x^2)^{-\beta\Delta/2}$,

for some given constants $c_\psi$, $C_\psi$ and $\beta \ge 0$. Note that an assumption of this type is also
considered in Neumann and Reiss (2007).

For the adaptive version of our estimator, we need additional assumptions for $g$:

(H5) There exists some positive $a$ such that $\int |g^*(x)|^2 (1 + x^2)^a dx < +\infty$,

and

(H6) $\int x^2 g^2(x) dx < +\infty$.
We must set independent assumptions for $\psi_\Delta$ and $g$, since there may be no relation at all
between these two functions (see the examples). Note that, in Assumption (H5), which is
a classical regularity assumption, the knowledge of $a$ is not required.



3. Examples. 

3.1. Compound Poisson processes. Let $L_t = \sum_{i=1}^{N_t} Y_i$, where $(N_t)$ is a Poisson process
with constant intensity $c$ and $(Y_i)$ is a sequence of i.i.d. random variables with density $f$
independent of the process $(N_t)$. Then, $(L_t)$ is a compound Poisson process with characteristic
function

(5) $\psi_t(u) = \exp\Big(ct \int_{\mathbb{R}} (e^{iux} - 1) f(x) dx\Big)$.

Its Levy density is $n(x) = c f(x)$. Assumptions (H1)-(H2)(p) are equivalent to $E(|Y_1|^p) < \infty$.
Assumption (H3) is equivalent to $\int_{\mathbb{R}} x^2 f^2(x) dx < \infty$, which holds for instance if
$\sup_x f(x) < +\infty$ and $E(Y_1^2) < +\infty$. We can compute the distribution of $Z_1^\Delta = L_\Delta$ as
follows:

(6) $P_{Z_1^\Delta}(dz) = e^{-c\Delta} \Big(\delta_0(dz) + \sum_{n \ge 1} f^{*n}(z) \frac{(c\Delta)^n}{n!} dz\Big)$.

We have the following bound:

(7) $1 \ge |\psi_\Delta(u)| \ge e^{-2c\Delta}$.

On this example, it appears clearly that we cannot link the regularity assumption on $g$
and (H4), which holds with $\beta = 0$.
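
As a numerical illustration of the bound (7), the sketch below evaluates $|\psi_\Delta(u)|$ for a compound Poisson process whose jumps are exponentially distributed. The exponential jump law and the parameter values are assumptions chosen for the example only; they do not come from the text.

```python
import numpy as np

def compound_poisson_cf(u, c, delta, rate):
    """psi_Delta(u) = exp(c * Delta * (f*(u) - 1)) for Exp(rate) jumps (illustrative choice)."""
    jump_cf = rate / (rate - 1j * u)  # Fourier transform f*(u) of the exponential density
    return np.exp(c * delta * (jump_cf - 1.0))

# Hypothetical parameter values, chosen only for the illustration.
c, delta, rate = 2.0, 0.5, 1.5
u = np.linspace(-200.0, 200.0, 4001)
modulus = np.abs(compound_poisson_cf(u, c, delta, rate))
lower_bound = np.exp(-2.0 * c * delta)

# Both comparisons should hold on the whole grid, in line with (7).
print(bool(modulus.min() >= lower_bound), bool(modulus.max() <= 1.0 + 1e-12))
```

The modulus never decays to zero here, which is the practical meaning of (H4) with $\beta = 0$.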



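3.2. The Levy gamma process. Let $\alpha > 0$, $\beta > 0$. The Levy gamma process $(L_t)$ with
parameters $(\beta, \alpha)$ is a subordinator such that, for all $t > 0$, $L_t$ has distribution Gamma
with parameters $(\beta t, \alpha)$, i.e. has density:

(8) $\frac{\alpha^{\beta t}}{\Gamma(\beta t)} x^{\beta t - 1} e^{-\alpha x} 1_{x \ge 0}$.

The characteristic function of $Z_1^\Delta$ is equal to:

(9) $\psi_\Delta(u) = \Big(\frac{\alpha}{\alpha - iu}\Big)^{\beta\Delta}$.

The Levy density is $n(x) = \beta x^{-1} e^{-\alpha x} 1_{\{x > 0\}}$ so that $g(x) = \beta e^{-\alpha x} 1_{\{x > 0\}}$ satisfies our
assumptions. We have:

(10) $\frac{\psi'_\Delta(u)}{\psi_\Delta(u)} = i\Delta \frac{\beta}{\alpha - iu}, \qquad |\psi_\Delta(u)| = \frac{\alpha^{\beta\Delta}}{(\alpha^2 + u^2)^{\beta\Delta/2}}$.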
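
The two identities in (10) can be checked numerically from (9). The sketch below compares a central finite difference of $\psi_\Delta$ with the closed form of $\psi'_\Delta/\psi_\Delta$; the parameter values and the step `h` are arbitrary choices made for the illustration.

```python
import numpy as np

def gamma_cf(u, alpha, beta, delta):
    """Characteristic function (9) of the increment of a Levy gamma process."""
    return (alpha / (alpha - 1j * u)) ** (beta * delta)

# Hypothetical parameter values, chosen only for the illustration.
alpha, beta, delta = 2.0, 1.5, 0.5
u = np.linspace(-20.0, 20.0, 401)
h = 1e-5  # finite-difference step

numeric_ratio = (gamma_cf(u + h, alpha, beta, delta)
                 - gamma_cf(u - h, alpha, beta, delta)) / (2.0 * h) / gamma_cf(u, alpha, beta, delta)
closed_ratio = 1j * delta * beta / (alpha - 1j * u)
closed_modulus = alpha ** (beta * delta) / (alpha ** 2 + u ** 2) ** (beta * delta / 2.0)

err_ratio = np.max(np.abs(numeric_ratio - closed_ratio))
err_modulus = np.max(np.abs(np.abs(gamma_cf(u, alpha, beta, delta)) - closed_modulus))
print(err_ratio, err_modulus)  # both should be very small
```

Since $g^*(u) = \beta/(\alpha - iu)$ for this model, the first check is also a check of relation (2).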
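3.3. Another class of subordinators. Consider the Levy process $(L_t)$ with Levy density

$n(x) = c\, x^{\delta - 1/2} x^{-1} e^{-\beta x} 1_{x > 0}$,

where $(\delta, \beta, c)$ are positive parameters. If $\delta > 1/2$, $\int_0^{+\infty} n(x) dx < +\infty$, and we recover
compound Poisson processes. If $0 < \delta \le 1/2$, $\int_0^{+\infty} n(x) dx = +\infty$ and $g(x) = x n(x)$
belongs to $L^2(\mathbb{R}) \cap L^1(\mathbb{R})$. The case $\delta = 0$, which corresponds to the Levy inverse Gaussian
process, does not fit in our framework. For $0 < \delta < 1/2$, we find

$\psi_\Delta(x) = \exp\Big(-c\Delta \frac{\Gamma(\delta + 1/2)}{1/2 - \delta} \big[(\beta - ix)^{1/2 - \delta} - \beta^{1/2 - \delta}\big]\Big)$

and

$|\psi_\Delta(x)| = \exp\Big(-c\Delta \frac{\Gamma(\delta + 1/2)}{1/2 - \delta} \big[(\beta^2 + x^2)^{(1/2 - \delta)/2} \cos\big((1/2 - \delta) \arctan(x/\beta)\big) - \beta^{1/2 - \delta}\big]\Big)$.

It is important to mention that $\psi_\Delta$ above does not satisfy assumption (H4) since

(11) $|\psi_\Delta(x)| \sim_{x \to +\infty} K(\beta, \delta) \exp\Big(-c\Delta \frac{\Gamma(\delta + 1/2)}{1/2 - \delta} \cos\big((1/2 - \delta)\pi/2\big)\, x^{1/2 - \delta}\Big)$,

where $K(\beta, \delta) = \exp\Big(c\Delta \frac{\Gamma(\delta + 1/2)}{1/2 - \delta} \beta^{1/2 - \delta}\Big)$. Thus, it has an exponential rate of decrease.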

3.4. The bilateral Gamma process. This process has been recently introduced by
Küchler and Tappe (2008). Consider $X$, $Y$ two independent random variables, $X$ with
distribution $\Gamma(\beta, \alpha)$ and $Y$ with distribution $\Gamma(\beta', \alpha')$. Then, $Z = X - Y$ has distribution
bilateral gamma with parameters $(\beta, \alpha, \beta', \alpha')$, that we denote by $\Gamma(\beta, \alpha; \beta', \alpha')$. The
characteristic function of $Z$ is equal to:

(12) $\psi(u) = \Big(\frac{\alpha}{\alpha - iu}\Big)^{\beta} \Big(\frac{\alpha'}{\alpha' + iu}\Big)^{\beta'} = \exp\Big(\int_{\mathbb{R}} (e^{iux} - 1) n(x) dx\Big)$,

with

$n(x) = x^{-1} g(x)$,

and, for $x \in \mathbb{R}$,

$g(x) = \beta e^{-\alpha x} 1_{(0, +\infty)}(x) - \beta' e^{-\alpha' |x|} 1_{(-\infty, 0)}(x)$.

The bilateral Gamma process $(L_t)$ has characteristic function $\psi_t(u) = \psi(u)^t$.

The method can be generalized and we may consider Levy processes on $\mathbb{R}$ obtained by
bilateralisation of two subordinators.

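3.5. Subordinated Processes. Let $(W_t)$ be a Brownian motion, and let $(Z_t)$ be an
increasing Levy process (subordinator), independent of $(W_t)$. Assume that the observed
process is

$L_t = W_{Z_t}$.

We have

$\psi_\Delta(u) = E(e^{iuL_\Delta}) = E(e^{-\frac{u^2}{2} Z_\Delta})$.

As $Z_t$ is positive, we consider, for $\lambda \ge 0$,

$\vartheta_\Delta(\lambda) = E(e^{-\lambda Z_\Delta}) = \exp\Big(-\Delta \int_0^{+\infty} (1 - e^{-\lambda x}) n_Z(x) dx\Big)$,

where $n_Z$ denotes the Levy density of $(Z_t)$. Now let us assume that $g_Z(x) = x n_Z(x)$ is
integrable over $(0, +\infty)$. We have:

$\log(\vartheta_\Delta(\lambda)) = -\Delta \int_0^{+\infty} \frac{1 - e^{-\lambda x}}{x}\, x n_Z(x) dx = -\Delta \int_0^{+\infty} \Big(\int_0^{\lambda} e^{-sx} ds\Big) x n_Z(x) dx$

$= -\Delta \int_0^{\lambda} \Big(\int_0^{+\infty} e^{-sx} x n_Z(x) dx\Big) ds$.

Hence,

$\psi_\Delta(u) = \exp\Big(-\Delta \int_0^{u^2/2} \Big(\int_0^{+\infty} e^{-sx} g_Z(x) dx\Big) ds\Big)$.

Moreover, it is possible to relate the Levy density $n_L$ of $(L_t)$ with the Levy density $n_Z$ of
$(Z_t)$ as follows. Consider $f$ a non negative function on $\mathbb{R}$, with $f(0) = 0$. Given the whole
path $(Z_t)$, the jumps $\delta L_s = W_{Z_s} - W_{Z_{s-}}$ are centered Gaussian with variance $\delta Z_s$. Hence,

$E\Big(\sum_{s \le t} f(\delta L_s)\Big) = E\Big(\sum_{s \le t} \int f(u) \exp\big(-u^2/(2\delta Z_s)\big) \frac{du}{\sqrt{2\pi \delta Z_s}}\Big)$

$= t \int f(u) du \Big(\int_0^{+\infty} \exp(-u^2/2x)\, n_Z(x) \frac{dx}{\sqrt{2\pi x}}\Big)$.

This gives $n_L(u) = \int_0^{+\infty} \exp(-u^2/2x)\, n_Z(x) \frac{dx}{\sqrt{2\pi x}}$. By the same tools, we see that

$E\Big(\sum_{s \le t} |\delta L_s|\Big) = \sqrt{2/\pi}\, E\Big(\sum_{s \le t} \sqrt{\delta Z_s}\Big) = \sqrt{2/\pi}\; t \int_0^{+\infty} \sqrt{x}\, n_Z(x) dx$.

Therefore, if the above integral is finite, the process $(L_t)$ has finite variation on compact
sets and it holds that $\int_{\mathbb{R}} |u| n_L(u) du < \infty$.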

With $(Z_t)$ a Levy-Gamma process, $g_Z(x) = \beta e^{-\alpha x} 1_{x > 0}$. Then $\int_0^{+\infty} e^{-sx} \beta e^{-\alpha x} dx = \beta/(\alpha + s)$, and

$\psi_\Delta(u) = \Big(\frac{\alpha}{\alpha + \frac{u^2}{2}}\Big)^{\beta\Delta}$.

This model is the Variance Gamma stochastic volatility model described by Madan and
Seneta (1990). As noted in Küchler and Tappe (2008), the Variance Gamma distributions
are special cases of bilateral Gamma distributions. The condition $\int_0^{+\infty} \sqrt{x}\, n_Z(x) dx < \infty$
holds. We can compute, for instance using the norming constant for an inverse Gaussian
density,

$n_L(u) = \int_0^{+\infty} \exp\Big(-\frac{1}{2}\Big(\frac{u^2}{x} + 2\alpha x\Big)\Big) \frac{\beta x^{-3/2}}{\sqrt{2\pi}} dx = \beta |u|^{-1} \exp\big(-(2\alpha)^{1/2} |u|\big)$.
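
The closed form of the Variance Gamma Levy density can be cross-checked by direct quadrature of the mixture integral. The sketch below does this with the change of variable $x = e^s$ and a plain trapezoidal rule; the grid bounds and the parameter values are assumptions made for the illustration.

```python
import numpy as np

def vg_levy_density_numeric(u, alpha, beta, n_grid=8001):
    """Quadrature of int_0^inf exp(-u^2/(2x)) * beta*x^{-1}*exp(-alpha*x) / sqrt(2*pi*x) dx."""
    s = np.linspace(-40.0, 12.0, n_grid)  # substitution x = exp(s), so dx = x ds
    x = np.exp(s)
    integrand = np.exp(-u ** 2 / (2.0 * x) - alpha * x) * beta * x ** (-0.5) / np.sqrt(2.0 * np.pi)
    step = s[1] - s[0]
    return step * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

def vg_levy_density_closed(u, alpha, beta):
    """Closed form beta * |u|^{-1} * exp(-sqrt(2*alpha) * |u|)."""
    return beta / abs(u) * np.exp(-np.sqrt(2.0 * alpha) * abs(u))

# Hypothetical parameter values, chosen only for the illustration.
alpha, beta = 1.0, 2.0
for u in (1.0, -0.5):
    print(u, vg_levy_density_numeric(u, alpha, beta), vg_levy_density_closed(u, alpha, beta))
```

The two columns printed for each `u` should agree to many digits, since the integrand is smooth and decays very fast in the variable `s`.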



4. Statistical strategy 

4.1. Notations. Subsequently we denote by $u^*$ the Fourier transform of the function $u$
defined as $u^*(y) = \int e^{iyx} u(x) dx$, and by $\|u\|$, $\langle u, v \rangle$, $u * v$ the quantities

$\|u\|^2 = \int |u(x)|^2 dx$,

$\langle u, v \rangle = \int u(x) \bar v(x) dx$ with $z \bar z = |z|^2$, and $u * v(x) = \int u(y) \bar v(x - y) dy$.

Moreover, we recall that for any integrable and square-integrable functions $u, u_1, u_2$,

(13) $(u^*)^*(x) = 2\pi u(-x)$ and $\langle u_1, u_2 \rangle = (2\pi)^{-1} \langle u_1^*, u_2^* \rangle$.

4.2. The projection spaces. As we use projection estimators, we describe now the
projection spaces. Let us define

$\varphi(x) = \frac{\sin(\pi x)}{\pi x}$ and $\varphi_{m,j}(x) = \sqrt{m}\, \varphi(mx - j)$,

where $m$ is an integer, that can be taken equal to $2^\ell$. It is well known (see Meyer (1990),
p. 22) that $\{\varphi_{m,j}\}_{j \in \mathbb{Z}}$ is an orthonormal basis of the space of square integrable functions
having Fourier transforms with compact support included into $[-\pi m, \pi m]$. Indeed an
elementary computation yields

(14) $\varphi^*_{m,j}(x) = \frac{e^{ixj/m}}{\sqrt{m}} 1_{[-\pi m, \pi m]}(x)$.

We denote by $S_m$ such a space:

$S_m = \mathrm{Span}\{\varphi_{m,j}, j \in \mathbb{Z}\} = \{h \in L^2(\mathbb{R}), \mathrm{supp}(h^*) \subset [-m\pi, m\pi]\}$.

We denote by $(S_m)_{m \in M_n}$ the collection of linear spaces, where

$M_n = \{1, \dots, m_n\}$

and $m_n \le n$ is the maximal admissible value of $m$, subject to constraints to be specified
later.

In practice, we should consider the truncated spaces $S_m^{(n)} = \mathrm{Span}\{\varphi_{m,j}, j \in \mathbb{Z}, |j| \le K_n\}$,
where $K_n$ is an integer depending on $n$, and the associated estimators. Under
assumption (H6), it is possible and does not change the main part of the study (see Comte
et al. (2006)). For the sake of simplicity, we consider here sums over $\mathbb{Z}$.

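4.3. Estimation strategy. We want to estimate $g$ such that

(15) $g^*(x) = -i \frac{\psi'_\Delta(x)}{\Delta \psi_\Delta(x)} = \frac{\theta_\Delta(x)}{\Delta \psi_\Delta(x)}$

with

$\psi_\Delta(x) = E(e^{ixZ_1^\Delta}), \qquad \theta_\Delta(x) = -i\psi'_\Delta(x) = E(Z_1^\Delta e^{ixZ_1^\Delta})$.

The orthogonal projection $g_m$ of $g$ on $S_m$ is given by

(16) $g_m = \sum_{j \in \mathbb{Z}} a_{m,j}(g) \varphi_{m,j}$ with $a_{m,j}(g) = \int_{\mathbb{R}} \varphi_{m,j}(x) g(x) dx = \langle \varphi_{m,j}, g \rangle$.

We have at hand the empirical versions of $\psi_\Delta$ and $\theta_\Delta$:

$\hat\psi_\Delta(x) = \frac{1}{n} \sum_{k=1}^n e^{ixZ_k^\Delta}, \qquad \hat\theta_\Delta(x) = \frac{1}{n} \sum_{k=1}^n Z_k^\Delta e^{ixZ_k^\Delta}$.

Following Neumann (1997) and Neumann and Reiss (2007), we truncate $1/\hat\psi_\Delta$ and set

(17) $\frac{1}{\tilde\psi_\Delta(x)} = \frac{1}{\hat\psi_\Delta(x)} 1_{|\hat\psi_\Delta(x)| > \kappa_\psi n^{-1/2}}$.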
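
The empirical quantities and the truncation (17) are straightforward to compute. The following is a minimal sketch, not the authors' code: the function names and the default value `kappa=1.0` for the threshold constant $\kappa_\psi$ are assumptions made for the illustration.

```python
import numpy as np

def empirical_cf(z, u):
    """Return (psi_hat(u), theta_hat(u)) = (mean exp(i u Z_k), mean Z_k exp(i u Z_k))."""
    phase = np.exp(1j * np.outer(u, z))  # shape (len(u), n)
    return phase.mean(axis=1), (phase * z).mean(axis=1)

def truncated_inverse(psi_hat, n, kappa=1.0):
    """1/psi_tilde as in (17): 1/psi_hat where |psi_hat| > kappa/sqrt(n), and 0 elsewhere."""
    keep = np.abs(psi_hat) > kappa / np.sqrt(n)
    inverse = np.zeros_like(psi_hat)
    inverse[keep] = 1.0 / psi_hat[keep]
    return inverse

# Tiny deterministic example: two observations, two frequencies.
z = np.array([0.0, np.pi])
u = np.array([0.0, 1.0])
psi_hat, theta_hat = empirical_cf(z, u)
inv_psi = truncated_inverse(psi_hat, z.size)
print(psi_hat, theta_hat, inv_psi)
```

In this toy example $\hat\psi_\Delta(1) = (1 + e^{i\pi})/2$ is numerically zero, so the truncation sets the inverse to zero at that frequency instead of dividing by a vanishing quantity.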
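Now, for $t$ belonging to a space $S_m$ of the collection $(S_m)_{m \in M_n}$, let us define

(18) $\gamma_n(t) = \frac{1}{n} \sum_{k=1}^n \Big(\|t\|^2 - \frac{1}{\pi\Delta} Z_k^\Delta \int e^{ixZ_k^\Delta} \frac{t^*(-x)}{\tilde\psi_\Delta(x)} dx\Big)$.

Consider $\gamma_n(t)$ as an approximation of the theoretical contrast

$\gamma_n^{th}(t) = \frac{1}{n} \sum_{k=1}^n \Big(\|t\|^2 - \frac{1}{\pi\Delta} Z_k^\Delta \int e^{ixZ_k^\Delta} \frac{t^*(-x)}{\psi_\Delta(x)} dx\Big)$.

The following sequence of equalities, relying on (13), explains the choice of the contrast:

$E\Big(\frac{1}{2\pi\Delta} Z_k^\Delta \int e^{ixZ_k^\Delta} \frac{t^*(-x)}{\psi_\Delta(x)} dx\Big) = \frac{1}{2\pi\Delta} \int \theta_\Delta(x) \frac{t^*(-x)}{\psi_\Delta(x)} dx = \frac{1}{2\pi} \langle g^*, t^* \rangle = \langle g, t \rangle$.

Therefore, we find that $E(\gamma_n^{th}(t)) = \|t\|^2 - 2\langle g, t \rangle = \|t - g\|^2 - \|g\|^2$ is minimal when $t = g$.
Thus, we define the estimator belonging to $S_m$ by

(19) $\hat g_m = \mathrm{Argmin}_{t \in S_m} \gamma_n(t)$.

This estimator can also be written

(20) $\hat g_m = \sum_{j \in \mathbb{Z}} \hat a_{m,j} \varphi_{m,j}$, with $\hat a_{m,j} = \frac{1}{2\pi n \Delta} \sum_{k=1}^n Z_k^\Delta \int e^{ixZ_k^\Delta} \frac{\varphi^*_{m,j}(-x)}{\tilde\psi_\Delta(x)} dx$,

or

$\hat a_{m,j} = \frac{1}{2\pi\Delta} \int \hat\theta_\Delta(x) \frac{\varphi^*_{m,j}(-x)}{\tilde\psi_\Delta(x)} dx$.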
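
Because of (14), the Fourier transform of the projection estimator equals $\hat\theta_\Delta/(\Delta\tilde\psi_\Delta)$ on $[-\pi m, \pi m]$ and vanishes outside, so that $\hat g_m(x) = \frac{1}{2\pi\Delta}\int_{-\pi m}^{\pi m} e^{-ixu}\, \hat\theta_\Delta(u)/\tilde\psi_\Delta(u)\, du$. The sketch below evaluates this integral by a trapezoidal rule. It is an illustration under stated assumptions, not the authors' implementation: the function names, the grid size `n_u` and the threshold constant `kappa` are hypothetical choices.

```python
import numpy as np

def trapezoid(values, step):
    """Trapezoidal rule along the last axis of a uniformly sampled array."""
    return step * (values.sum(axis=-1) - 0.5 * (values[..., 0] + values[..., -1]))

def invert_on_band(ratio, u, x, delta):
    """Real part of (1/(2*pi*Delta)) * int exp(-i*x*u) * ratio(u) du over the grid u."""
    kernel = np.exp(-1j * np.outer(x, u))  # shape (len(x), len(u))
    return trapezoid(kernel * ratio, u[1] - u[0]).real / (2.0 * np.pi * delta)

def projection_estimator(z, delta, m, x, kappa=1.0, n_u=2001):
    """Evaluate g_hat_m at the points x from the increments z (sketch of (17)-(20))."""
    n = z.size
    u = np.linspace(-np.pi * m, np.pi * m, n_u)
    phase = np.exp(1j * np.outer(u, z))
    psi_hat = phase.mean(axis=1)
    theta_hat = (phase * z).mean(axis=1)
    keep = np.abs(psi_hat) > kappa / np.sqrt(n)  # truncation (17)
    ratio = np.zeros_like(psi_hat)
    ratio[keep] = theta_hat[keep] / psi_hat[keep]
    return invert_on_band(ratio, u, x, delta)

# Illustration on simulated Levy gamma increments (alpha = beta = 1, Delta = 1),
# for which g(x) = exp(-x) on (0, +infinity).
rng = np.random.default_rng(0)
z = rng.gamma(shape=1.0, scale=1.0, size=500)
x = np.linspace(0.5, 3.0, 6)
print(projection_estimator(z, 1.0, 2, x))
```

The helper `invert_on_band` can also be fed the exact ratio $\theta_\Delta/\psi_\Delta$ of a known model, which gives the projection $g_m$ itself and isolates the bias from the stochastic error.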



4.4. Risk bound of the collection of estimators. First, we recall a key Lemma,
borrowed from Neumann (1997) (see his Lemma 2.1):

Lemma 4.1. It holds that, for any $p \ge 1$,

$E\Big(\Big|\frac{1}{\tilde\psi_\Delta(x)} - \frac{1}{\psi_\Delta(x)}\Big|^{2p}\Big) \le C \Big(\frac{1}{|\psi_\Delta(x)|^{2p}} \wedge \frac{n^{-p}}{|\psi_\Delta(x)|^{4p}}\Big)$,

where $1/\tilde\psi_\Delta$ is defined by (17).

Neumann's result is for $p = 1$ but the extension to any $p$ is straightforward. See also
Neumann and Reiss (2007). This lemma allows us to prove the following risk bound.

Proposition 4.1. Under Assumptions (H1)-(H2)(4)-(H3), for all $m$:

(21) $E(\|g - \hat g_m\|^2) \le \|g - g_m\|^2 + \frac{K}{n\Delta^2} \int_{-\pi m}^{\pi m} \frac{dx}{|\psi_\Delta(x)|^2}$,

where $K$ is a constant.

It is worth stressing that (H4) is not required for the above result. Therefore, it holds
even for exponential decay of $\psi_\Delta$.

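Proof of Proposition 4.1. First with Pythagoras Theorem, we have

(22) $\|g - \hat g_m\|^2 = \|g - g_m\|^2 + \|g_m - \hat g_m\|^2$.

Let

$a_{m,j}(g) = \frac{1}{2\pi\Delta} \int \varphi^*_{m,j}(-x) \frac{\theta_\Delta(x)}{\psi_\Delta(x)} dx$.

Then, using Parseval's formula and (14), we obtain

$\|\hat g_m - g_m\|^2 = \sum_{j \in \mathbb{Z}} |\hat a_{m,j} - a_{m,j}(g)|^2 = \frac{1}{2\pi\Delta^2} \int_{-\pi m}^{\pi m} \Big|\frac{\hat\theta_\Delta(x)}{\tilde\psi_\Delta(x)} - \frac{\theta_\Delta(x)}{\psi_\Delta(x)}\Big|^2 dx$.

It follows that

$E(\|\hat g_m - g_m\|^2) \le \frac{1}{\pi\Delta^2} \int_{-\pi m}^{\pi m} E\Big(\Big|\hat\theta_\Delta(x) \Big(\frac{1}{\tilde\psi_\Delta(x)} - \frac{1}{\psi_\Delta(x)}\Big)\Big|^2\Big) dx + \frac{1}{\pi\Delta^2} \int_{-\pi m}^{\pi m} \frac{E|\hat\theta_\Delta(x) - \theta_\Delta(x)|^2}{|\psi_\Delta(x)|^2} dx$

(23) $\le \frac{2}{\pi\Delta^2} \int_{-\pi m}^{\pi m} E\Big(|\hat\theta_\Delta(x) - \theta_\Delta(x)|^2 \Big|\frac{1}{\tilde\psi_\Delta(x)} - \frac{1}{\psi_\Delta(x)}\Big|^2\Big) dx$

$\quad + \frac{2}{\pi\Delta^2} \int_{-\pi m}^{\pi m} \Delta^2 |g^*(x) \psi_\Delta(x)|^2\, E\Big(\Big|\frac{1}{\tilde\psi_\Delta(x)} - \frac{1}{\psi_\Delta(x)}\Big|^2\Big) dx + \frac{1}{\pi\Delta^2} \int_{-\pi m}^{\pi m} \frac{1}{n} \frac{E[(Z_1^\Delta)^2]}{|\psi_\Delta(x)|^2} dx$.

The Schwarz Inequality yields

$E\Big(|\hat\theta_\Delta(x) - \theta_\Delta(x)|^2 \Big|\frac{1}{\tilde\psi_\Delta(x)} - \frac{1}{\psi_\Delta(x)}\Big|^2\Big) \le E^{1/2}(|\hat\theta_\Delta(x) - \theta_\Delta(x)|^4)\, E^{1/2}\Big(\Big|\frac{1}{\tilde\psi_\Delta(x)} - \frac{1}{\psi_\Delta(x)}\Big|^4\Big)$.

Then, with the Rosenthal inequality $E(|\hat\theta_\Delta(x) - \theta_\Delta(x)|^4) \le c E[(Z_1^\Delta)^4]/n^2$ and by using
Lemma 4.1,

$E\Big(\Big|\frac{1}{\tilde\psi_\Delta(x)} - \frac{1}{\psi_\Delta(x)}\Big|^4\Big) \le \frac{c}{|\psi_\Delta(x)|^4}$,

so that

$E^{1/2}(|\hat\theta_\Delta(x) - \theta_\Delta(x)|^4)\, E^{1/2}\Big(\Big|\frac{1}{\tilde\psi_\Delta(x)} - \frac{1}{\psi_\Delta(x)}\Big|^4\Big) \le \frac{c E^{1/2}[(Z_1^\Delta)^4]}{n |\psi_\Delta(x)|^2}$.

For the second term, we use Lemma 4.1 to get

$E\Big(\Big|\frac{1}{\tilde\psi_\Delta(x)} - \frac{1}{\psi_\Delta(x)}\Big|^2\Big) \le \frac{C n^{-1}}{|\psi_\Delta(x)|^4}$.

We obtain

(24) $E(\|\hat g_m - g_m\|^2) \le \frac{K}{n\Delta^2} \big(E^{1/2}[(Z_1^\Delta)^4] + \Delta^2 \|g\|_1^2 + E[(Z_1^\Delta)^2]\big) \int_{-\pi m}^{\pi m} \frac{dx}{|\psi_\Delta(x)|^2}$,

where $\|g\|_1 = \int |g(x)| dx$. Therefore, gathering (22) and (24) implies the result. □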

Remark 4.1. In papers concerned with deconvolution in presence of unknown error densities,
the error characteristic function is estimated using a preliminary and independent
set of data. This solution is possible here: we may split the sample and use the first half
to obtain a preliminary and independent estimator of $\psi_\Delta$ and then estimate $g$ from the
second half. This would simplify the above proof, but not the study of the adaptive case.



4.5. Discussion about the rates. Let us study some examples and use (21) to get a
relevant choice of $m$. We have $\|g - g_m\|^2 = \frac{1}{2\pi} \int_{|x| \ge \pi m} |g^*(x)|^2 dx$. Suppose that $g$ belongs to
the Sobolev class

$S(a, L) = \{f, \int |f^*(x)|^2 (x^2 + 1)^a dx \le L\}$.

Then, the bias term satisfies

$\|g - g_m\|^2 = O(m^{-2a})$.

Under (H4), the bound of the variance term satisfies

$\frac{\int_{-\pi m}^{\pi m} dx/|\psi_\Delta(x)|^2}{n\Delta} = O\Big(\frac{m^{2\beta\Delta + 1}}{n\Delta}\Big)$.

The optimal choice for $m$ is $O((n\Delta)^{1/(2\beta\Delta + 2a + 1)})$ and the resulting rate for the risk is
$(n\Delta)^{-2a/(2\beta\Delta + 2a + 1)}$. It is worth noting that the sampling interval $\Delta$ explicitly appears
in the exponent of the rate. Therefore, for positive $\beta$, the rate is worse for large $\Delta$ than
for small $\Delta$.
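
The bias-variance trade-off above reduces to simple arithmetic on exponents. The small sketch below evaluates the order of the optimal cutoff and of the rate (constants are ignored); the function names are hypothetical and the numerical inputs are chosen only for the illustration.

```python
def optimal_cutoff(n, delta, a, beta):
    """Order of the optimal m: (n*Delta)^(1/(2*beta*Delta + 2a + 1)), constants ignored."""
    return (n * delta) ** (1.0 / (2.0 * beta * delta + 2.0 * a + 1.0))

def risk_rate(n, delta, a, beta):
    """Order of the risk: (n*Delta)^(-2a/(2*beta*Delta + 2a + 1)), constants ignored."""
    return (n * delta) ** (-2.0 * a / (2.0 * beta * delta + 2.0 * a + 1.0))

# Compound Poisson case (beta = 0) with smoothness a = 1 and n*Delta = 1000.
print(optimal_cutoff(1000, 1.0, 1.0, 0.0), risk_rate(1000, 1.0, 1.0, 0.0))

# Same total observation length n*Delta = 1000 with beta = 1: larger Delta, slower rate.
print(risk_rate(2000, 0.5, 1.0, 1.0), risk_rate(500, 2.0, 1.0, 1.0))
```

The second line of output compares two designs with the same $n\Delta$: the exponent is $-2/4$ for $\Delta = 0.5$ and $-2/7$ for $\Delta = 2$, which illustrates the remark on the role of $\Delta$ when $\beta > 0$.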

• Let us consider the example of the compound Poisson process. In this case $\beta = 0$, and the upper
bound of the mean integrated squared error is of order $O((n\Delta)^{-2a/(2a + 1)})$, if $g$ belongs to
the Sobolev class $S(a, L)$. Note that if $g$ is analytic, i.e. belongs to a class

$A(\gamma, Q) = \{f, \int (e^{\gamma x} + e^{-\gamma x})^2 |f^*(x)|^2 dx \le Q\}$,

then the risk is of order $O(\ln(n\Delta)/(n\Delta))$ (choose $m = O(\ln(n\Delta))$).

• For the Levy Gamma process, we have a more precise result since we have

$|\psi_\Delta(u)| = \frac{\alpha^{\beta\Delta}}{(\alpha^2 + u^2)^{\beta\Delta/2}}, \qquad g^*(x) = \frac{\beta}{\alpha - ix}$.

Therefore $\int_{|x| \ge \pi m} |g^*(x)|^2 dx = O(m^{-1})$ and $\int_{[-\pi m, \pi m]} dx/|\psi_\Delta(x)|^2 = O(m^{2\beta\Delta + 1})$. The
resulting rate is of order $(n\Delta)^{-1/(2\beta\Delta + 2)}$ for a choice of $m$ of order $O((n\Delta)^{1/(2\beta\Delta + 2)})$.

• For the Bilateral Gamma process with (/?, a) = (/?', a'), we have 

^ A{U)= (*1 + U ^ 9 * {x) = ^Tx-f 
Therefore J^Jg* (x)\ 2 dx = 0{m^) and J { _ mm] dx/\^ A {x)\ 2 = 0(m^ A+1 ). The 
resulting rate is of order (nA)~ 3 ^ 4,3A+ ^ for a choice of m of order 0({nA) l l^ A+ ^) . 

These examples illustrate that the relevant choice of $m$ depends on the unknown function,
in particular on its smoothness. The model selection procedure proposes a data
driven criterion to select $m$.

• Consider now the process described in Section 3.3. In that case, it follows from (11) that
$\int_{[-\pi m, \pi m]} dx/|\psi_\Delta(x)|^2 = O(m^{\delta + 1/2} \exp(\kappa m^{1/2 - \delta}))$ and $\int_{|x| \ge \pi m} |g^*(x)|^2 dx = O(m^{-2\delta})$. In
this case, choosing $\kappa m^{1/2 - \delta} = \ln(n\Delta)/2$ gives the rate $[\ln(n\Delta)]^{-2\delta}$ which is thus very slow,
but known to be optimal in the usual deconvolution setting (see Fan (1991)). This case is
not considered in the following for the adaptive strategy since it does not satisfy (H4).



4.6. Study of the adaptive estimator. We have to select an adequate value of $m$. For
this, we start by defining the term

(25) $\Phi_\psi(m) = \int_{-\pi m}^{\pi m} \frac{dx}{|\psi_\Delta(x)|^2}$,

and the following theoretical penalty

(26) $pen(m) = \kappa \big(1 + E[(Z_1^\Delta)^2]/\Delta\big) \frac{\Phi_\psi(m)}{n\Delta}$.

We set

$\hat m = \arg\min_{m \in M_n} \{\gamma_n(\hat g_m) + pen(m)\}$,

and study first the "risk" of $\hat g_{\hat m}$.

Moreover we need the following assumption on the collection of models $M_n = \{1, \dots, m_n\}$,
$m_n \le n$:

(H7) $\exists \varepsilon$, $0 < \varepsilon < 1$, $m_n^{2\beta\Delta} \le C n^{1 - \varepsilon}$,

where $C$ is a fixed constant and $\beta$ is defined by (H4).
For instance, Assumption (H7) is fulfilled if:

(1) $pen(m_n) \le C$. In such a case, we have $m_n \le C (n\Delta)^{1/(2\beta\Delta + 1)}$.

(2) $\Delta$ is small enough to ensure $2\beta\Delta < 1$. In such a case we can take $M_n = \{1, \dots, n\}$.

Remark 4.2. Assumption (H7) raises a problem since it depends on the unknown $\beta$
and concrete implementation requires the knowledge of $m_n$. It is worth stressing that the
analogous difficulty arises in deconvolution with unknown error density (see Comte and
Lacour (2008)). In the compound Poisson model, $\beta = 0$ and nothing is needed. Otherwise
one should at least know if $\psi_\Delta$ is in a class of polynomial decay. The estimator $\hat\psi_\Delta$ may
be used to that purpose and to provide an estimator of $\beta$ (see e.g. Diggle and Hall (1993)).

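Let us define

$\theta_\Delta^{(1)}(x) = E\big(Z_1^\Delta 1_{|Z_1^\Delta| \le k_n \sqrt{\Delta}}\, e^{ixZ_1^\Delta}\big), \qquad \theta_\Delta^{(2)}(x) = E\big(Z_1^\Delta 1_{|Z_1^\Delta| > k_n \sqrt{\Delta}}\, e^{ixZ_1^\Delta}\big)$,

so that $\theta_\Delta = \theta_\Delta^{(1)} + \theta_\Delta^{(2)}$ and analogously $\hat\theta_\Delta = \hat\theta_\Delta^{(1)} + \hat\theta_\Delta^{(2)}$. For any two functions $t, s$ in $S_m$,
the contrast $\gamma_n$ satisfies:

(27) $\gamma_n(t) - \gamma_n(s) = \|t - g\|^2 - \|s - g\|^2 - 2\nu_n^{(1)}(t - s) - 2\nu_n^{(2)}(t - s) - 2 \sum_{i=1}^4 R_n^{(i)}(t - s)$,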

F. COMTE AND V. GENON-CATALOT 



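with

$\nu_n^{(1)}(t) = \frac{1}{2\pi\Delta} \int t^*(-x) \frac{\hat\theta_\Delta^{(1)}(x) - \theta_\Delta^{(1)}(x)}{\psi_\Delta(x)} dx$,

$\nu_n^{(2)}(t) = \frac{1}{2\pi\Delta} \int t^*(-x) \frac{\theta_\Delta(x)}{[\psi_\Delta(x)]^2} (\psi_\Delta(x) - \hat\psi_\Delta(x)) dx$,

$R_n^{(1)}(t) = \frac{1}{2\pi\Delta} \int t^*(-x) (\hat\theta_\Delta(x) - \theta_\Delta(x)) \Big(\frac{1}{\tilde\psi_\Delta(x)} - \frac{1}{\psi_\Delta(x)}\Big) dx$,

$R_n^{(2)}(t) = \frac{1}{2\pi\Delta} \int t^*(-x) \frac{\theta_\Delta(x)}{\psi_\Delta(x)} (\psi_\Delta(x) - \hat\psi_\Delta(x)) \Big(\frac{1}{\tilde\psi_\Delta(x)} - \frac{1}{\psi_\Delta(x)}\Big) dx$,

$R_n^{(3)}(t) = \frac{1}{2\pi\Delta} \int t^*(-x) \frac{\hat\theta_\Delta^{(2)}(x) - \theta_\Delta^{(2)}(x)}{\psi_\Delta(x)} dx$,

$R_n^{(4)}(t) = -\frac{1}{2\pi\Delta} \int t^*(-x) \frac{\theta_\Delta(x)}{\psi_\Delta(x)} 1_{|\hat\psi_\Delta(x)| \le \kappa_\psi/\sqrt{n}}\, dx$.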
Using this decomposition and Talagrand's inequality, we can prove

Theorem 4.1. Assume that assumptions (H1)-(H2)(8)-(H3)-(H7) hold. Then

$E(\|\hat g_{\hat m} - g\|^2) \le C \inf_{m \in M_n} \big(\|g - g_m\|^2 + pen(m)\big) + K \frac{\ln^2(n)}{n\Delta}$,

where $K$ is a constant.



Remark 4.3. Assumption (H6) is satisfied for the Levy-Gamma process. For the compound
Poisson process, it is equivalent to $\int x^4 f^2(x) dx < +\infty$, where $f$ denotes the density
of $Y_1$ (see Section 3.1).

To get an estimator, we replace the theoretical penalty by:

$\widehat{pen}(m) = \kappa' \Big(1 + \frac{1}{n\Delta} \sum_{i=1}^n (Z_i^\Delta)^2\Big) \frac{\int_{-\pi m}^{\pi m} dx/|\tilde\psi_\Delta(x)|^2}{n\Delta}$.

In that case we can prove:

Theorem 4.2. Assume that assumptions (H1)-(H2)(8)-(H3)-(H7) hold and let $\tilde g = \hat g_{\hat{\hat m}}$ be
the estimator defined with $\hat{\hat m} = \arg\min_{m \in M_n} (\gamma_n(\hat g_m) + \widehat{pen}(m))$. Then

$E(\|\tilde g - g\|^2) \le C \inf_{m \in M_n} \big(\|g - g_m\|^2 + pen(m)\big) + K'_\Delta \frac{\ln^2(n)}{n}$,

where $K'_\Delta$ is a constant depending on $\Delta$ (and on fixed quantities but not on $n$).

Theorem 4.2 shows that the adaptive estimator automatically achieves the best rate
that can be hoped for. If $g$ belongs to the Sobolev ball $S(a, L)$, and under (H4), the rate is
automatically of order $O((n\Delta)^{-2a/(2\beta\Delta + 2a + 1)})$. See Section 4.5.
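
The data-driven selection of $m$ can be sketched in a few lines. Since $\hat g_m$ minimizes the contrast over $S_m$, one has $\gamma_n(\hat g_m) = -\|\hat g_m\|^2 = -\frac{1}{2\pi\Delta^2}\int_{-\pi m}^{\pi m} |\hat\theta_\Delta(x)/\tilde\psi_\Delta(x)|^2 dx$, which is what the code below uses. This is only an illustration under assumptions: the penalty is taken of the form $\kappa\,(1 + \frac{1}{n\Delta}\sum_i (Z_i^\Delta)^2)\int_{-\pi m}^{\pi m} dx/|\tilde\psi_\Delta(x)|^2/(n\Delta)$, and the function names, the constants `kappa_pen` and `kappa_psi` and the quadrature grid are hypothetical choices that would need calibration in practice.

```python
import numpy as np

def trapezoid(values, step):
    """Trapezoidal rule on a uniform grid."""
    return step * (values.sum() - 0.5 * (values[0] + values[-1]))

def select_cutoff(z, delta, m_max, kappa_pen=1.0, kappa_psi=1.0, pts_per_unit=50):
    """Return (m_selected, criteria) minimizing -||g_hat_m||^2 + estimated penalty."""
    n = z.size
    second_moment = np.mean(z ** 2)
    criteria = {}
    for m in range(1, m_max + 1):
        u = np.linspace(-np.pi * m, np.pi * m, 2 * pts_per_unit * m + 1)
        step = u[1] - u[0]
        phase = np.exp(1j * np.outer(u, z))
        psi_hat = phase.mean(axis=1)
        theta_hat = (phase * z).mean(axis=1)
        keep = np.abs(psi_hat) > kappa_psi / np.sqrt(n)  # truncation (17)
        inv_sq = np.zeros(u.size)
        inv_sq[keep] = 1.0 / np.abs(psi_hat[keep]) ** 2
        norm_sq = trapezoid(np.abs(theta_hat) ** 2 * inv_sq, step) / (2.0 * np.pi * delta ** 2)
        penalty = kappa_pen * (1.0 + second_moment / delta) * trapezoid(inv_sq, step) / (n * delta)
        criteria[m] = -norm_sq + penalty
    return min(criteria, key=criteria.get), criteria

# Illustration on simulated Levy gamma increments (alpha = beta = 1, Delta = 0.5).
rng = np.random.default_rng(1)
z = rng.gamma(shape=0.5, scale=1.0, size=400)
m_selected, criteria = select_cutoff(z, 0.5, 6)
print(m_selected)
```

The selected value depends on the penalty constant, which the theory leaves unspecified; the sketch only shows how the two terms of the criterion are assembled.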

Remark 4.4. (1) It is possible to extend our study of the adaptive estimator to the
case of $\psi_\Delta$ having exponential decay. Note that the faster $|\psi_\Delta|$ decays, the more
difficult it will be to estimate $g$.
(2) Few results on rates of convergence are available in the literature for this problem.
The results of Neumann and Reiss (2007) are difficult to compare with ours since
the point of view is different.

5. Proofs 



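5.1. Proof of Theorem 4.1. Writing that $\gamma_n(\hat g_{\hat m}) + pen(\hat m) \le \gamma_n(g_m) + pen(m)$ in view
of (27) implies that

$\|\hat g_{\hat m} - g\|^2 \le \|g_m - g\|^2 + 2\nu_n^{(1)}(\hat g_{\hat m} - g_m) + 2\nu_n^{(2)}(\hat g_{\hat m} - g_m) + 2 \sum_{i=1}^4 R_n^{(i)}(\hat g_{\hat m} - g_m) + pen(m) - pen(\hat m)$.

Let us take expectations of both sides and bound each r.h.s. term.

$|E(2\nu_n^{(1)}(\hat g_{\hat m} - g_m))| \le \frac{1}{16} E(\|\hat g_{\hat m} - g_m\|^2) + 16 E\Big(\sup_{t \in S_m + S_{\hat m}, \|t\| = 1} |\nu_n^{(1)}(t)|^2\Big)$

$\le \frac{1}{8} E(\|g - \hat g_{\hat m}\|^2) + \frac{1}{8} \|g - g_m\|^2 + 16 E\Big(\sup_{t \in S_{m \vee \hat m}, \|t\| = 1} |\nu_n^{(1)}(t)|^2 - p_1(m, \hat m)\Big)_+ + 16 E(p_1(m, \hat m))$.

The same kind of bounds are obtained for $\nu_n^{(2)}$ and the residuals, leading to

(28) $\frac{1}{4} E(\|\hat g_{\hat m} - g\|^2) \le \frac{7}{4} \|g - g_m\|^2 + 16 E\Big(\sup_{t \in S_{m \vee \hat m}, \|t\| = 1} |\nu_n^{(1)}(t)|^2 - p_1(m, \hat m)\Big)_+$

$\quad + 16 E\Big(\sup_{t \in S_{m \vee \hat m}, \|t\| = 1} |\nu_n^{(2)}(t)|^2 - p_2(m, \hat m)\Big)_+ + 16 \sum_{i=1}^2 E\Big(\sup_{t \in S_{m \vee \hat m}, \|t\| = 1} |R_n^{(i)}(t)|^2 - p_1(m, \hat m)\Big)_+$

$\quad + 16 \sum_{i=3}^4 E\Big(\sup_{t \in S_{m_n}, \|t\| = 1} |R_n^{(i)}(t)|^2\Big) + pen(m) + E\big(48 p_1(m, \hat m) + 16 p_2(m, \hat m) - pen(\hat m)\big)$.

Next, the definition of $pen(.)$ comes from the following constraint:

(29) $48 p_1(m, m') + 16 p_2(m, m') \le pen(m') + pen(m)$.

This leads to

$pen(m) + E\big(48 p_1(m, \hat m) + 16 p_2(m, \hat m) - pen(\hat m)\big) \le 2 pen(m)$.

First, we apply Talagrand's Inequality recalled in Lemma 6.1 to prove the following result:

Proposition 5.1. Under the assumptions of Theorem 4.1, define

$p_1(m, m') = \Big(2 E[(Z_1^\Delta)^2] \int_{-\pi(m \vee m')}^{\pi(m \vee m')} |\psi_\Delta(x)|^{-2} dx\Big) / (\pi n \Delta^2)$,

then

(30) $\sum_{m' \in M_n} E\Big(\sup_{t \in S_{m \vee m'}, \|t\| = 1} |\nu_n^{(1)}(t)|^2 - p_1(m, m')\Big)_+ \le \frac{c}{n\Delta}$.

Next we prove:

Proposition 5.2. Under the assumptions of Theorem 4.1, define $p_2(m, m') = 0$ if $-a + \beta\Delta \le 0$
and $p_2(m, m') = \big(\int_{-\pi(m \vee m')}^{\pi(m \vee m')} |\psi_\Delta(x)|^{-2} dx\big)/n$ otherwise. Then

(31) $E\Big(\sup_{t \in S_{m \vee \hat m}, \|t\| = 1} |\nu_n^{(2)}(t)|^2 - p_2(m, \hat m)\Big)_+ \le \frac{c}{n}$.

For the residual terms, two types of results can be obtained.

Proposition 5.3. Under the assumptions of Theorem 4.1, for $i = 1, 2$,

$E\Big(\sup_{t \in S_{m \vee \hat m}, \|t\| = 1} [R_n^{(i)}(t)]^2 - p_1(m, \hat m)\Big)_+ \le \frac{C}{n\Delta}$,

and

Proposition 5.4. Under the assumptions of Theorem 4.1, for $i = 3, 4$,

$E\Big(\sup_{t \in S_{m_n}, \|t\| = 1} [R_n^{(i)}(t)]^2\Big) \le c \frac{\ln^2(n)}{n\Delta}$.

Then the choice of $pen(m)$ given by (26) gives, following (28) and (29),

$\frac{1}{4} E(\|\hat g_{\hat m} - g\|^2) \le \frac{7}{4} \|g - g_m\|^2 + 2 pen(m) + C \frac{\ln^2(n)}{n\Delta}$,

which is the result. □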

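5.2. Proof of Proposition 5.1. Let

$\omega_t(z) = z 1_{\{|z| \le k_n \sqrt{\Delta}\}} \frac{1}{2\pi\Delta} \int e^{izx} \frac{t^*(-x)}{\psi_\Delta(x)} dx$,

and notice that

$\nu_n^{(1)}(t) = \frac{1}{n} \sum_{k=1}^n [\omega_t(Z_k^\Delta) - E(\omega_t(Z_k^\Delta))]$.

To apply Lemma 6.1, we compute $M_1$, $H_1$ and $v_1$ defined therein. First, we have

$E\Big(\sup_{t \in S_m, \|t\| = 1} |\nu_n^{(1)}(t)|^2\Big) \le \sum_{j \in \mathbb{Z}} E\big(|\nu_n^{(1)}(\varphi_{m,j})|^2\big) = \frac{1}{2\pi\Delta^2} \int_{-\pi m}^{\pi m} \frac{E|\hat\theta_\Delta^{(1)}(x) - \theta_\Delta^{(1)}(x)|^2}{|\psi_\Delta(x)|^2} dx \le \frac{E[(Z_1^\Delta)^2]}{2\pi n \Delta^2} \Phi_\psi(m)$,

where $\Phi_\psi(m)$ is defined in (25). We can take, for $m^* = m \vee m'$,

$H_1^2 = \frac{E[(Z_1^\Delta)^2]}{2\pi n \Delta^2} \Phi_\psi(m^*)$.

Then it is easy to see that if $\|t\| = 1$ and $t \in S_{m^*}$, then

$|\omega_t(z)| \le \frac{k_n}{2\pi\sqrt{\Delta}} \int \Big|\frac{t^*(-x)}{\psi_\Delta(x)}\Big| dx \le \frac{k_n}{\sqrt{2\pi\Delta}} \sqrt{\Phi_\psi(m^*)} := M_1$.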



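Lastly, for $t \in S_m$, $\|t\| = 1$, $t = \sum_{j \in \mathbb{Z}} t_{m,j} \varphi_{m,j}$,

$\mathrm{Var}(\omega_t(Z_1^\Delta)) \le E|\omega_t(Z_1^\Delta)|^2 = \frac{1}{(2\pi\Delta)^2} \iint t^*(-u) \overline{t^*(-v)} \frac{h^*_\Delta(u - v)}{\psi_\Delta(u) \psi_\Delta(-v)} du\, dv$.

Denoting by

(32) $h^*_\Delta(u) = E\big[e^{iuZ_1^\Delta} (Z_1^\Delta)^2 1_{|Z_1^\Delta| \le k_n \sqrt{\Delta}}\big]$,

we obtain:

$\mathrm{Var}(\omega_t(Z_1^\Delta)) \le \frac{1}{(2\pi\Delta)^2} \Big(\iint |t^*(-u) t^*(-v)|^2 du\, dv\Big)^{1/2} \Big(\iint_{[-\pi m, \pi m]^2} \Big|\frac{h^*_\Delta(u - v)}{\psi_\Delta(u) \psi_\Delta(-v)}\Big|^2 du\, dv\Big)^{1/2}$

$= \frac{1}{2\pi\Delta^2} \Big(\iint_{[-\pi m, \pi m]^2} \Big|\frac{h^*_\Delta(u - v)}{\psi_\Delta(u) \psi_\Delta(-v)}\Big|^2 du\, dv\Big)^{1/2}$,

where the last equality follows from the Parseval equality. Next with the Schwarz inequality
and the Fubini theorem, we obtain

$\mathrm{Var}(\omega_t(Z_1^\Delta)) \le \frac{1}{2\pi\Delta^2} \Big(\iint_{[-\pi m, \pi m]^2} \frac{|h^*_\Delta(u - v)|^2}{|\psi_\Delta(u)|^4} du\, dv\Big)^{1/2}$

$\le \frac{1}{2\pi\Delta^2} \Big(\int_{[-\pi m, \pi m]} \frac{du}{|\psi_\Delta(u)|^4} \int |h^*_\Delta(z)|^2 dz\Big)^{1/2}$

$\le \frac{\sqrt{\int_{-\pi m^*}^{\pi m^*} dx/|\psi_\Delta(x)|^4}}{2\pi\Delta} \cdot \frac{\|h^*_\Delta\|}{\Delta}$.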

Now we use the following Lemma:

Lemma 5.1. Under the assumptions of Theorem 4.1,

$\|h^*_\Delta\|/\Delta \le 2\sqrt{\pi} \Big(\int x^2 g^2(x) dx + E[(Z_1^\Delta)^2] \|g\|^2\Big)^{1/2} := \xi$.

Thus, under (H6), $\xi$ is finite. We set

$v_1 = \frac{\xi \sqrt{\int_{-\pi m^*}^{\pi m^*} dx/|\psi_\Delta(x)|^4}}{2\pi\Delta}$.

Therefore, setting $\epsilon^2 = 1/2$,

$p_1(m, m') = 4 E[(Z_1^\Delta)^2/\Delta] \frac{\Phi_\psi(m^*)}{2\pi n \Delta} \;(= 2(1 + 2\epsilon^2) H_1^2)$.
Using (H4) and the fact that E[(Z 1 A ) 2 /A] is bounded, we find 

E| sup \vjp(t)\ 2 -pi(m,m')\ < c | K) 2 ^ 2 
VtGS m *,||t||=i J, I nA 



^ n 2 A 



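Here $K = K(c_\psi, C_\psi)$. Moreover, we take

(33) $k_n = K' \sqrt{n}/((2\beta\Delta + 3) \ln(n))$

and we obtain

$\sum_{m' \in M_n} E\Big(\sup_{t \in S_{m^*}, \|t\| = 1} [\nu_n^{(1)}(t)]^2 - p_1(m, m')\Big)_+ \le \frac{K''}{n\Delta}$.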


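5.3. Proof of Proposition 5.2. The study of $\nu_n^{(2)}$ is slightly different.

$E\Big(\sup_{t \in S_m, \|t\| = 1} |\nu_n^{(2)}(t)|^2\Big) \le \frac{1}{2\pi\Delta^2} \int_{-\pi m}^{\pi m} \frac{|\theta_\Delta(x)|^2}{|\psi_\Delta(x)|^4} E|\hat\psi_\Delta(x) - \psi_\Delta(x)|^2 dx \le \frac{1}{2\pi n} \int_{-\pi m}^{\pi m} \frac{|g^*(x)|^2}{|\psi_\Delta(x)|^2} dx$.

With assumptions (H4) and (H5), we can see that if $-a + \beta\Delta \le 0$, then

$\int \frac{|g^*(x)|^2}{|\psi_\Delta(x)|^2} dx \le \frac{1}{c_\psi^2} \int |g^*(x)|^2 (1 + x^2)^{\beta\Delta} dx \le \frac{1}{c_\psi^2} \int |g^*(x)|^2 (1 + x^2)^a dx \le \frac{L}{c_\psi^2}$.

In that case, we simply take $p_2(m, m') = 0$ and write

$E\Big(\sup_{t \in S_{m \vee \hat m}, \|t\| = 1} [\nu_n^{(2)}]^2(t)\Big) \le E\Big(\sup_{t \in S_{m_n}, \|t\| = 1} [\nu_n^{(2)}]^2(t)\Big) \le \frac{L}{2\pi n c_\psi^2}$.

Now we study the case $-a + \beta\Delta > 0$ and find the constants $H = H_2$, $v = v_2$, $\epsilon = \epsilon_2$ to
apply Lemma 6.1. Consider

$\omega_t(z) = (1/2\pi\Delta) \int e^{izu} t^*(-u) \{\theta_\Delta(u)/[\psi_\Delta(u)]^2\} du$.

As

$\int_{-\pi m}^{\pi m} \frac{|g^*(x)|^2}{|\psi_\Delta(x)|^2} dx \le \frac{L}{c_\psi^2} (\pi m)^{-2a + 2\beta\Delta}$,

we take

$H_2^2 = \frac{L (\pi m^*)^{-2a + 2\beta\Delta}}{2\pi c_\psi^2 n}$.

Next, we have

$M_2 = \sqrt{n} H_2$

and we use the rough bound $v_2 = n H_2^2$. Moreover, we take $\epsilon_2^2 = (-2a + 2\beta\Delta + 2) \ln(m^*)/K_1$.
There exists $m_0$, such that for $m^* \ge m_0$,

$2(1 + 2\epsilon_2^2) H_2^2 \le \Phi_\psi(m^*)/n$.

We set $p_2(m, m') = \Phi_\psi(m^*)/n$. Introducing

$W_n(m, m') = \Big[\sup_{t \in S_{m \vee m'}, \|t\| = 1} |\nu_n^{(2)}|^2(t) - p_2(m, m')\Big]_+$,

we find that

$\sum_{m' \in M_n} E(W_n(m, m')) = \sum_{m' | m^* \le m_0} E(W_n(m, m')) + \sum_{m' | m^* > m_0} E(W_n(m, m'))$

$\le \sum_{m' | m^* \le m_0} E\Big(\Big[\sup_{t \in S_{m^*}, \|t\| = 1} |\nu_n^{(2)}(t)|^2 - 2(1 + 2\epsilon_2^2) H_2^2\Big]_+\Big) + \sum_{m' | m^* \le m_0} |p_2(m, m') - 2(1 + 2\epsilon_2^2) H_2^2|$

$\quad + \sum_{m' | m^* > m_0} E\Big(\Big[\sup_{t \in S_{m^*}, \|t\| = 1} |\nu_n^{(2)}(t)|^2 - 2(1 + 2\epsilon_2^2) H_2^2\Big]_+\Big)$.

Therefore

$\sum_{m' \in M_n} E(W_n(m, m')) \le 2 \sum_{m' \in M_n} E\Big(\Big[\sup_{t \in S_{m^*}, \|t\| = 1} |\nu_n^{(2)}(t)|^2 - 2(1 + 2\epsilon_2^2) H_2^2\Big]_+\Big) + \sum_{m' | m^* \le m_0} |p_2(m, m') - 2(1 + 2\epsilon_2^2) H_2^2|$

$\le 2 \sum_{m' \in M_n} E\Big(\Big[\sup_{t \in S_{m^*}, \|t\| = 1} |\nu_n^{(2)}(t)|^2 - 2(1 + 2\epsilon_2^2) H_2^2\Big]_+\Big) + \frac{C(m_0)}{n}$.

Talagrand's Inequality can then be applied again and gives that

$\sum_{m' \in M_n} E\Big(\Big[\sup_{t \in S_{m^*}, \|t\| = 1} |\nu_n^{(2)}(t)|^2 - 2(1 + 2\epsilon_2^2) H_2^2\Big]_+\Big) \le \frac{C}{n}$.

The result for $\nu_n^{(2)}$ in this case then follows by saying, as for $\nu_n^{(1)}$, that

$E(W_n(m, \hat m)) \le \sum_{m' \in M_n} E(W_n(m, m'))$.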

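5.4. Proof of Proposition 5.3. First define $\Omega(x) = \Omega_1(x) \cap \Omega_2(x)$ with

$\Omega_1(x) = \Big\{|\hat\theta_\Delta(x) - \theta_\Delta(x)| \le 8 E^{1/2}[(Z_1^\Delta)^2] \log^{1/2}(n)\, n^{-1/2}\Big\}$,

$\Omega_2(x) = \Big\{\Big|\frac{1}{\tilde\psi_\Delta(x)} - \frac{1}{\psi_\Delta(x)}\Big| \le 1/\big(\log^{1/2}(n)\, n^{\omega} |\psi_\Delta(x)|^2\big)\Big\}$.

Then split: $R_n^{(1)}(t) = R_n^{(1,1)}(t) + R_n^{(1,2)}(t)$ where $R_n^{(1,1)}(t)$ is the integral restricted to $\Omega(x)$
and $R_n^{(1,2)}(t)$ the integral on the complement of $\Omega(x)$.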



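We have

$$\mathbb{E}\Big(\sup_{t\in S_{m\vee\hat m},\,\|t\|=1}|R_n^{(1)}(t)|^2\Big) \le 2\,\mathbb{E}\Big(\sup_{t\in S_{m\vee\hat m},\,\|t\|=1}|R_n^{(1,1)}(t)|^2\Big) + 2\,\mathbb{E}\Big(\sup_{t\in S_{m_n},\,\|t\|=1}|R_n^{(1,2)}(t)|^2\Big).$$

For the first term,

$$\begin{aligned}
\mathbb{E}\Big(\sup_{t\in S_{m\vee\hat m},\,\|t\|=1}|R_n^{(1,1)}(t)|^2\Big)
 &\le \frac{1}{2\pi\Delta^2}\,\mathbb{E}\Big(\int_{-\pi(m\vee\hat m)}^{\pi(m\vee\hat m)}|\hat\theta_\Delta(x)-\theta_\Delta(x)|^2\,\Big|\frac{1}{\tilde\psi_\Delta(x)}-\frac{1}{\psi_\Delta(x)}\Big|^2\,\mathbf{1}_{\Omega(x)}\,dx\Big)\\
 &\le \frac{8^2\,\mathbb{E}[(Z_1^\Delta)^2]}{2\pi n\Delta^2}\,\mathbb{E}\Big(\int_{-\pi(m\vee\hat m)}^{\pi(m\vee\hat m)}\frac{n^{-2\omega}}{|\psi_\Delta(x)|^4}\,dx\Big)\\
 &\le \frac{4\,\mathbb{E}[(Z_1^\Delta)^2]}{\pi\Delta}\,\mathbb{E}\Big(\frac{\Phi_\psi(m\vee\hat m)}{n\Delta}\Big)
 \le \mathbb{E}(p_1(m,\hat m))
\end{aligned}$$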

under the condition $-2\omega + (1-\epsilon) \le 0$. Therefore we choose $\omega = (1-\epsilon)/2$. Note that if $\beta = 0$ the decomposition is useless and the residual is straightforwardly negligible.
On the other hand, Lemma 4.1 yields:



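$$\mathbb{E}^{1/4}\Big(\Big|\frac{1}{\tilde\psi_\Delta(x)}-\frac{1}{\psi_\Delta(x)}\Big|^8\Big) \le \frac{C}{n\,|\psi_\Delta(x)|^4}.$$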



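Now, we find

$$\begin{aligned}
\mathbb{E}\Big(\sup_{t\in S_{m_n},\,\|t\|=1}|R_n^{(1,2)}(t)|^2\Big)
 &\le \frac{1}{2\pi\Delta^2}\int_{-\pi m_n}^{\pi m_n}\mathbb{E}\Big(|\hat\theta_\Delta(x)-\theta_\Delta(x)|^2\,\Big|\frac{1}{\tilde\psi_\Delta(x)}-\frac{1}{\psi_\Delta(x)}\Big|^2\,\mathbf{1}_{\Omega(x)^c}\Big)\,dx\\
 &\le \frac{1}{\Delta^2}\int_{-\pi m_n}^{\pi m_n}\frac{C\,\mathbb{E}^{1/4}[(Z_1^\Delta)^8]\;\mathbb{P}^{1/2}(\Omega(x)^c)}{2\pi n^2\,|\psi_\Delta(x)|^4}\,dx\\
 &\le \frac{C\,\mathbb{E}^{1/4}[(Z_1^\Delta)^8]\;n^{2(1-\epsilon)+1-b}}{2\pi n^2\Delta^2}
 \le \frac{C_\Delta}{n}
\end{aligned}$$

if $\mathbb{P}(\Omega(x)^c) \le n^{-2b}$ and $2(1-\epsilon) - b \le 0$.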



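We take $b = 2(1-\epsilon)$. In fact,

$$\mathbb{P}(\Omega(x)^c) \le \mathbb{P}(\Omega_1(x)^c) + \mathbb{P}(\Omega_2(x)^c).$$

We use the Markov Inequality to bound $\mathbb{P}(\Omega_2(x)^c)$:

$$\mathbb{P}(\Omega_2(x)^c) \le \log^p(n)\,n^{2p\omega}\,|\psi_\Delta(x)|^{4p}\;\mathbb{E}\Big(\Big|\frac{1}{\tilde\psi_\Delta(x)}-\frac{1}{\psi_\Delta(x)}\Big|^{2p}\Big) \le C_p\,\log^p(n)\,n^{2p\omega-p}.$$

The choice of $p$ is thus constrained by $2p\omega - p = -p(1-2\omega) < -4(1-\epsilon)$, that is $p > 4(1-\epsilon)/\epsilon$, e.g. $p = 5(1-\epsilon)/\epsilon$.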
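The arithmetic behind this constraint can be written out as follows, as a worked check that assumes $\omega = (1-\epsilon)/2$ with $0 < \epsilon < 1$, the value chosen earlier in this proof.

```latex
\begin{aligned}
1 - 2\omega &= 1 - (1-\epsilon) = \epsilon,\\
2p\omega - p &= -p(1-2\omega) = -p\,\epsilon,\\
-p\,\epsilon < -4(1-\epsilon) &\iff p > \frac{4(1-\epsilon)}{\epsilon},\\
p = \frac{5(1-\epsilon)}{\epsilon} &\;\Longrightarrow\; -p\,\epsilon = -5(1-\epsilon) < -4(1-\epsilon).
\end{aligned}
```

The extra factor $\log^p(n)$ is absorbed by the strict inequality for $n$ large enough.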

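We use the decomposition $\hat\theta_\Delta(x) = \hat\theta_\Delta^{(1)}(x) + \hat\theta_\Delta^{(2)}(x)$ with

$$\hat\theta_\Delta^{(1)}(x) = \frac{1}{n}\sum_{k=1}^n Z_k^\Delta\,\mathbf{1}_{\big\{|Z_k^\Delta| \le \frac{\sqrt{n\,\mathbb{E}[(Z_1^\Delta)^2]}}{8\sqrt{\log(n)}}\big\}}\,e^{ixZ_k^\Delta},$$

and we denote by $\theta_\Delta^{(j)}(x)$ the expectation of $\hat\theta_\Delta^{(j)}(x)$, $j = 1, 2$.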
We use the Bernstein Inequality to bound $\mathbb{P}(\Omega_1(x)^c)$. If $X_1, \dots, X_n$ are i.i.d. variables with variance less than $v^2$ and such that $|X_i| \le c$, then for $S_n = \sum_{i=1}^n X_i$ we have:

$$\mathbb{P}(|S_n - \mathbb{E}(S_n)| \ge n\epsilon) \le 2\exp\Big(-\frac{n\epsilon^2/2}{v^2 + c\,\epsilon}\Big).$$
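As a numerical sanity check of this kind of bound, the following minimal sketch compares the right-hand side $2\exp(-(n\epsilon^2/2)/(v^2 + c\epsilon))$ with a Monte Carlo estimate of the tail probability. The uniform distribution on $[-1, 1]$ (so $v^2 = 1/3$, $c = 1$), the sample sizes, and the function names `bernstein_bound` and `empirical_tail` are illustrative assumptions, not part of the paper.

```python
import math
import random


def bernstein_bound(n, eps, v2, c):
    """Right-hand side 2*exp(-(n*eps^2/2) / (v2 + c*eps)) of the inequality."""
    return 2.0 * math.exp(-(n * eps ** 2 / 2.0) / (v2 + c * eps))


def empirical_tail(n, eps, trials, seed=0):
    """Monte Carlo frequency of |S_n - E(S_n)| >= n*eps for centered uniforms."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # X_i uniform on [-1, 1]: mean 0, variance 1/3, |X_i| <= 1.
        s = sum(rng.uniform(-1.0, 1.0) for _ in range(n))
        if abs(s) >= n * eps:
            hits += 1
    return hits / trials


n, eps = 200, 0.1
bound = bernstein_bound(n, eps, v2=1.0 / 3.0, c=1.0)
freq = empirical_tail(n, eps, trials=5000)
print(f"empirical tail = {freq:.4f}, Bernstein bound = {bound:.4f}")
```

The empirical frequency should fall well below the bound, which is expected since Bernstein-type inequalities are not tight for moderate deviations.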

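This yields

$$\begin{aligned}
\mathbb{P}(\Omega_1(x)^c)
 &\le \mathbb{P}\Big(|\hat\theta_\Delta^{(1)}(x)-\theta_\Delta^{(1)}(x)| > 4\sqrt{\mathbb{E}[(Z_1^\Delta)^2]\log(n)/n}\Big)
   + \mathbb{P}\Big(|\hat\theta_\Delta^{(2)}(x)-\theta_\Delta^{(2)}(x)| > 4\sqrt{\mathbb{E}[(Z_1^\Delta)^2]\log(n)/n}\Big)\\
 &\le 2n^{-16/3} + \frac{\mathbb{E}\Big[(Z_1^\Delta)^2\,\mathbf{1}_{\big\{|Z_1^\Delta| > \frac{\sqrt{n\,\mathbb{E}[(Z_1^\Delta)^2]}}{8\sqrt{\log(n)}}\big\}}\Big]}{16\,\mathbb{E}[(Z_1^\Delta)^2]\log(n)}\\
 &\le 2n^{-16/3} + \frac{8^4\,\mathbb{E}[(Z_1^\Delta)^6]\,\log^2(n)}{16\,\mathbb{E}^3[(Z_1^\Delta)^2]\,n^2}
\end{aligned}$$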



n 



2A 2 ' 



This gives the result of Proposition 5.3 for $R_n^{(1)}$. The study of $R_n^{(2)}$ follows the same line and is omitted.



5.5. Proof of Proposition 5.4. First we study $R_n^{(3)}$.

E( sup \R^(t)A < -i-^E sup [(§ { l ) (x)-e { l\x)) t -^ldx 2 
Vte5 m „.lltll=i / 47r A teS™„. Iltll=l J 



2^A 2 7_ mn J |^a(x)| 2 

™» Var (^l A l| Z A|> fcnV /A) cfe 



< 



< 



2^A 2 J_ vma n |^A(x)p 

2vrnA;6A 4 
KE[(Z A ) 8 ] ln 6 (n) 



2+ £A 4 



using the choice of $k_n$ given by (33).
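Next,

$$\mathbb{E}\Big(\sup_{t\in S_{m_n},\,\|t\|=1}|R_n^{(4)}(t)|^2\Big) \le \frac{1}{2\pi}\int_{-\pi m_n}^{\pi m_n}|g^*(x)|^2\;\mathbb{P}\big(|\hat\psi_\Delta(x)| \le \kappa/\sqrt{n}\big)\,dx.$$

If $|\psi_\Delta(x)| \ge 2\kappa/\sqrt{n}$, then

$$\begin{aligned}
\mathbb{P}\big(|\hat\psi_\Delta(x)| \le \kappa n^{-1/2}\big)
 &\le \mathbb{P}\big(|\hat\psi_\Delta(x)-\psi_\Delta(x)| \ge |\psi_\Delta(x)| - \kappa n^{-1/2}\big)\\
 &\le \mathbb{P}\big(|\hat\psi_\Delta(x)-\psi_\Delta(x)| \ge \tfrac12|\psi_\Delta(x)|\big)\\
 &\le \exp(-cn|\psi_\Delta(x)|^2)
\end{aligned}$$

for some $c > 0$, where the last inequality follows from Bernstein's Inequality.

Now, it follows from (H4) that $|\psi_\Delta(u)| \ge c_\psi(1+u^2)^{-\beta\Delta/2}$. Therefore, for $|u| \le \pi m_n$, with $m_n^{2\beta\Delta} \le Cn^{1-\epsilon}$ by (H7), we get $|\psi_\Delta(u)| \ge c'\,n^{-(1-\epsilon)/2} \ge 2\kappa n^{-1/2}$ for $n$ large enough.

Moreover, with the previous remarks, $\exp(-cn|\psi_\Delta(u)|^2) \le \exp(-cn^{\epsilon})$ and thus

$$\int_{-\pi m_n}^{\pi m_n}|g^*(x)|^2\;\mathbb{P}\big(|\hat\psi_\Delta(x)| \le \kappa/\sqrt{n}\big)\,dx \le \|g^*\|^2\exp(-cn^{\epsilon}).$$

Therefore

$$\mathbb{E}\Big(\sup_{t\in S_{m_n},\,\|t\|=1}|R_n^{(4)}(t)|^2\Big) \le \frac{C}{n\Delta}.$$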



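5.6. Proof of Lemma 5.1. Let us denote by $P_\Delta$ the distribution of $Z_1^\Delta$ and define $\mu_\Delta(dz) = \Delta^{-1} z\,P_\Delta(dz)$. Let us set $\mu(dx) = g(x)dx$. The relation $\theta_\Delta = \Delta\, g^*\psi_\Delta$ states that

$$\mu_\Delta^* = \mu^*\,P_\Delta^*.$$

Hence, $\mu_\Delta = \mu \star P_\Delta$. Therefore, $\mu_\Delta$ has a density given by

$$\int g(z-y)\,P_\Delta(dy) = \mathbb{E}\,g(z - Z_1^\Delta).$$

Moreover, we have, for any compactly supported function $t$:

$$\frac{1}{\Delta}\,\mathbb{E}\big(Z_1^\Delta\,t(Z_1^\Delta)\big) = \int t(z)\,\mathbb{E}\,g(z - Z_1^\Delta)\,dz = \int \mathbb{E}\big(t(x + Z_1^\Delta)\big)\,g(x)\,dx.$$

Hence, we first apply the Parseval formula:

$$\begin{aligned}
\|h_\Delta^*\|^2 = \int |h_\Delta^*(x)|^2dx = 2\pi\int h_\Delta^2(x)\,dx
 &= 2\pi\Delta\int z^2\,\mathbf{1}_{|z|\le k_n\sqrt{\Delta}}\;\mathbb{E}^2\big(g(z - Z_1^\Delta)\big)\,dz\\
 &\le 2\pi\Delta\,\mathbb{E}\Big(\int z^2\,\mathbf{1}_{|z|\le k_n\sqrt{\Delta}}\;g^2(z - Z_1^\Delta)\,dz\Big)\\
 &\le 2\pi\Delta\,\mathbb{E}\Big(\int (x + Z_1^\Delta)^2 g^2(x)\,dx\Big)
  \le 4\pi\Delta\,\mathbb{E}\Big(\int \big(x^2 + (Z_1^\Delta)^2\big)g^2(x)\,dx\Big)\\
 &\le 4\pi\Delta\Big(\int x^2g^2(x)\,dx + \mathbb{E}[(Z_1^\Delta)^2]\,\|g\|^2\Big).
\end{aligned}$$

This ends the proof. □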
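The three elementary facts used in the chain of inequalities of this proof can be spelled out as follows. This is only a reading aid: it assumes $g$ is square integrable and that the indicator is simply bounded by 1 before the change of variable.

```latex
\begin{aligned}
&\text{(Jensen)} &&
 \mathbb{E}^2\big(g(z - Z_1^\Delta)\big) \le \mathbb{E}\big(g^2(z - Z_1^\Delta)\big),\\
&\text{(change of variable } x = z - Z_1^\Delta\text{)} &&
 \int z^2 g^2(z - Z_1^\Delta)\,dz = \int (x + Z_1^\Delta)^2 g^2(x)\,dx,\\
&\text{(convexity)} &&
 (x + Z_1^\Delta)^2 \le 2x^2 + 2(Z_1^\Delta)^2 .
\end{aligned}
```

Taking expectations in the last line and using $\int g^2(x)\,dx = \|g\|^2$ gives the final bound.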
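5.7. Proof of Theorem 4.2. Let us define the sets

$$\Omega_1 = \Big\{\forall m\in\mathcal{M}_n,\ \int_{-\pi m}^{\pi m}\Big|\frac{1}{\tilde\psi_\Delta(x)}-\frac{1}{\psi_\Delta(x)}\Big|^2dx \le k_1\int_{-\pi m}^{\pi m}\frac{dx}{|\psi_\Delta(x)|^2}\Big\}$$

and

$$\Omega_2 = \Big\{\Big|\frac{\frac1n\sum_{i=1}^n (Z_i^\Delta)^2}{\mathbb{E}[(Z_1^\Delta)^2]} - 1\Big| \le k_2\Big\}.$$

Take $0 < k_1 < 1/2$ and $0 < k_2 < 1$. On $\Omega_1$, we have, $\forall m\in\mathcal{M}_n$,

$$\int_{-\pi m}^{\pi m}\frac{dx}{|\tilde\psi_\Delta(x)|^2} \le (2k_1+2)\int_{-\pi m}^{\pi m}\frac{dx}{|\psi_\Delta(x)|^2}
\quad\text{and}\quad
\int_{-\pi m}^{\pi m}\frac{dx}{|\psi_\Delta(x)|^2} \le \frac{2}{1-2k_1}\int_{-\pi m}^{\pi m}\frac{dx}{|\tilde\psi_\Delta(x)|^2},$$

and on $\Omega_2$, we find

$$\frac1n\sum_{i=1}^n (Z_i^\Delta)^2 \le (1+k_2)\,\mathbb{E}[(Z_1^\Delta)^2]
\quad\text{and}\quad
\mathbb{E}[(Z_1^\Delta)^2] \le \frac{1}{1-k_2}\,\frac1n\sum_{i=1}^n (Z_i^\Delta)^2.$$

It follows that, on $\Omega_1\cap\Omega_2 := \Omega_{1,2}$, we can choose $k'$ large enough to ensure

$$48\,p_1(m,\hat m) + 16\,p_2(m,\hat m) + \widehat{\mathrm{pen}}(m) - \widehat{\mathrm{pen}}(\hat m) \le C(a,b)\,\mathrm{pen}(m).$$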
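This allows us to extend the result of Theorem 4.1 as follows: $\forall m\in\mathcal{M}_n$,

$$\mathbb{E}\big(\|\tilde g - g\|^2\,\mathbf{1}_{\Omega_{1,2}}\big) \le C\big(\|g - g_m\|^2 + \mathrm{pen}(m)\big) + \frac{K\ln^2(n)}{n\Delta}.$$

Next we need to prove that

$$\mathbb{E}\big(\|\tilde g - g\|^2\,\mathbf{1}_{\Omega_{1,2}^c}\big) \le \frac{C}{n}. \qquad (34)$$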

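First, we prove that $\mathbb{P}(\Omega_{1,2}^c) \le c/n^2$ by proving that $\mathbb{P}(\Omega_1^c) \le c/n^2$ and $\mathbb{P}(\Omega_2^c) \le c/n^2$.

$$\begin{aligned}
\mathbb{P}(\Omega_1^c)
 &\le \sum_{m\in\mathcal{M}_n}\mathbb{P}\Big(\int_{-\pi m}^{\pi m}\Big|\frac{1}{\tilde\psi_\Delta(x)}-\frac{1}{\psi_\Delta(x)}\Big|^2dx > k_1\Phi_\psi(m)\Big)\\
 &\le \sum_{m\in\mathcal{M}_n}\frac{1}{(k_1\Phi_\psi(m))^p}\,\mathbb{E}\Big[\Big(\int_{-\pi m}^{\pi m}\Big|\frac{1}{\tilde\psi_\Delta(x)}-\frac{1}{\psi_\Delta(x)}\Big|^2dx\Big)^p\Big]\\
 &\le \sum_{m\in\mathcal{M}_n}\frac{(2\pi m)^{p-1}}{(k_1\Phi_\psi(m))^p}\int_{-\pi m}^{\pi m}\mathbb{E}\Big(\Big|\frac{1}{\tilde\psi_\Delta(x)}-\frac{1}{\psi_\Delta(x)}\Big|^{2p}\Big)dx\\
 &\le C\sum_{m\in\mathcal{M}_n}\frac{m^{p-1}\,n^{-p}}{(\Phi_\psi(m))^p}\int_{-\pi m}^{\pi m}\frac{dx}{|\psi_\Delta(x)|^{4p}}\\
 &\le C\sum_{m\in\mathcal{M}_n} m^{(p-1)-p(2\beta\Delta+1)+4p\beta\Delta+1}\,n^{-p}
  = C_p\sum_{m\in\mathcal{M}_n} m^{2p\beta\Delta}\,n^{-p}\\
 &\le C'n^{1-p+p(1-\epsilon)} \le C''n^{1-p\epsilon}.
\end{aligned}$$

As $m^{2\beta\Delta+1}/(n\Delta)$ is bounded, $m^{2p\beta\Delta}n^{-p} = O(n^{2p\beta\Delta/(2\beta\Delta+1)-p}) = O(n^{-p/(2\beta\Delta+1)})$. Therefore, choosing $p = 3/\epsilon$ ensures that $n^{1-p\epsilon} = n^{-2}$ and $\mathbb{P}(\Omega_1^c) \le C/n^2$.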
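The exponent bookkeeping in this bound can be checked line by line. The sketch below assumes, as in the surrounding argument, that $m^{2\beta\Delta} \le Cn^{1-\epsilon}$ for all $m \in \mathcal{M}_n$ by (H7), and additionally that $\mathcal{M}_n$ contains at most $n$ models (this cardinality bound is an assumption made here for illustration).

```latex
\begin{aligned}
(p-1) - p(2\beta\Delta+1) + 4p\beta\Delta + 1
  &= p - 1 - 2p\beta\Delta - p + 4p\beta\Delta + 1 = 2p\beta\Delta,\\
m^{2p\beta\Delta}\,n^{-p}
  &\le C^p\,n^{p(1-\epsilon)}\,n^{-p} = C^p\,n^{-p\epsilon},\\
\sum_{m\in\mathcal{M}_n} m^{2p\beta\Delta}\,n^{-p}
  &\le C^p\,n\cdot n^{-p\epsilon} = C^p\,n^{1-p\epsilon},\\
p = 3/\epsilon &\;\Longrightarrow\; n^{1-p\epsilon} = n^{1-3} = n^{-2}.
\end{aligned}
```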
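On the other hand,

$$\mathbb{P}[\Omega_2^c] \le \frac{1}{k_2^p\,\mathbb{E}^p[(Z_1^\Delta)^2]}\;\mathbb{E}\Big[\Big|\frac1n\sum_{i=1}^n\big((Z_i^\Delta)^2 - \mathbb{E}[(Z_i^\Delta)^2]\big)\Big|^p\Big].$$

Here the choice $p = 4$ gives $\mathbb{P}[\Omega_2^c] = O(1/n^2)$ with a simple variance inequality, provided that $\mathbb{E}[(Z_1^\Delta)^8] < +\infty$.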
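Next, we write that

$$\|g - \tilde g\|^2 = \|g - g_{\hat m}\|^2 + \|g_{\hat m} - \hat g_{\hat m}\|^2 \le \|g\|^2 + \sum_{j}|\hat a_{\hat m,j} - a_{\hat m,j}(g)|^2$$

and

$$\sum_{j}|\hat a_{\hat m,j} - a_{\hat m,j}(g)|^2 \le C\Big\{\sup_{t\in S_{\hat m},\,\|t\|=1}|\nu_n^{(1)}(t)|^2 + \sup_{t\in S_{\hat m},\,\|t\|=1}|\nu_n^{(2)}(t)|^2 + \sum_{k=1}^{4}\sup_{t\in S_{\hat m},\,\|t\|=1}|R_n^{(k)}(t)|^2\Big\}.$$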



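It follows that $\mathbb{E}\big(\|g\|^2\,\mathbf{1}_{\Omega_{1,2}^c}\big) = \|g\|^2\,\mathbb{P}(\Omega_{1,2}^c) \le c/n$, and for $k = 3, 4$,

$$\mathbb{E}\Big(\sup_{t\in S_{\hat m},\,\|t\|=1}|R_n^{(k)}(t)|^2\,\mathbf{1}_{\Omega_{1,2}^c}\Big) \le \mathbb{E}\Big(\sup_{t\in S_{m_n},\,\|t\|=1}|R_n^{(k)}(t)|^2\Big) \le C/n$$

as it has been proved previously. Lastly,

$$\begin{aligned}
\mathbb{E}\Big(\sup_{t\in S_{\hat m},\,\|t\|=1}|\nu_n^{(1)}(t)|^2\,\mathbf{1}_{\Omega_{1,2}^c}\Big)
 &\le \mathbb{E}\Big(\Big[\sup_{t\in S_{\hat m},\,\|t\|=1}\big\{|\nu_n^{(1)}(t)|^2 - \mathrm{pen}(\hat m)\big\}\Big]_+\Big) + \mathbb{E}\big(\mathrm{pen}(\hat m)\,\mathbf{1}_{\Omega_{1,2}^c}\big)\\
 &\le C\Big(\frac1n + n\,\mathbb{P}(\Omega_{1,2}^c)\Big) \le \frac{C'}{n}
\end{aligned}$$

using the proof of Theorem 4.1 and the fact that pen(.) is less than $O(n)$. The same line can be followed for the other terms.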

6. Appendix 

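Lemma 6.1. Let $Y_1, \dots, Y_n$ be independent random variables, let $\nu_{n,Y}(f) = (1/n)\sum_{i=1}^n[f(Y_i) - \mathbb{E}(f(Y_i))]$ and let $\mathcal{F}$ be a countable class of uniformly bounded measurable functions. Then for $\epsilon^2 > 0$

$$\mathbb{E}\Big[\sup_{f\in\mathcal{F}}|\nu_{n,Y}(f)|^2 - 2(1+2\epsilon^2)H^2\Big]_+ \le \frac{4}{K_1}\Big(\frac{v}{n}\,e^{-K_1\epsilon^2\frac{nH^2}{v}} + \frac{98M^2}{K_1n^2C^2(\epsilon^2)}\,e^{-\frac{2K_1C(\epsilon^2)\epsilon}{7\sqrt2}\frac{nH}{M}}\Big),$$

with $C(\epsilon^2) = \sqrt{1+\epsilon^2} - 1$, $K_1 = 1/6$, and

$$\sup_{f\in\mathcal{F}}\|f\|_\infty \le M,\qquad \mathbb{E}\Big[\sup_{f\in\mathcal{F}}|\nu_{n,Y}(f)|\Big] \le H,\qquad \sup_{f\in\mathcal{F}}\frac1n\sum_{k=1}^n\mathrm{Var}(f(Y_k)) \le v.$$

This result follows from the concentration inequality given in Klein and Rio (2005) and arguments in Birge and Massart (1998) (see the proof of their Corollary 2 page 354). It can be extended to the case where $\mathcal{F}$ is a unit ball of a linear space.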